4 research outputs found

    Task-Agnostic Graph Neural Network Evaluation via Adversarial Collaboration

    It has become increasingly important to develop reliable methods for evaluating the progress of Graph Neural Network (GNN) research in molecular representation learning. Existing GNN benchmarking methods for molecular representation learning focus on comparing GNN performance on node/graph classification or regression tasks over particular datasets. However, there is no principled, task-agnostic method to directly compare two GNNs. Additionally, most existing self-supervised learning works apply handcrafted augmentations to the data, which are difficult to transfer to graphs because of their unique characteristics. To address these issues, we propose GraphAC (Graph Adversarial Collaboration) -- a conceptually novel, principled, task-agnostic, and stable framework for evaluating GNNs through contrastive self-supervision. We introduce a novel objective function, the Competitive Barlow Twins, which allows two GNNs to jointly update themselves through direct competition against each other. GraphAC succeeds in distinguishing GNNs of different expressiveness across various aspects, and is demonstrated to be a principled and reliable GNN evaluation method, without requiring any augmentations.
    Comment: 11th International Conference on Learning Representations (ICLR 2023) Machine Learning for Drug Discovery (MLDD) Workshop. 17 pages, 6 figures, 4 tables.
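    The abstract does not spell out the Competitive Barlow Twins objective, but it builds on the standard Barlow Twins cross-correlation loss. Below is a minimal PyTorch sketch of that underlying loss between the embeddings produced by two GNNs on the same batch of graphs; the function name, the `lambda_offdiag` weight, and the use of two GNN outputs in place of two augmented views are illustrative assumptions, not the paper's exact competitive formulation.

```python
# Minimal sketch of a Barlow Twins-style cross-correlation loss (PyTorch).
# The Competitive Barlow Twins in GraphAC builds on this idea; the exact
# competitive formulation is defined in the paper, not reproduced here.
import torch

def barlow_twins_loss(z_a: torch.Tensor, z_b: torch.Tensor,
                      lambda_offdiag: float = 5e-3) -> torch.Tensor:
    """z_a, z_b: (batch, dim) graph embeddings produced by two different GNNs."""
    # Standardize each embedding dimension across the batch.
    z_a = (z_a - z_a.mean(dim=0)) / (z_a.std(dim=0) + 1e-6)
    z_b = (z_b - z_b.mean(dim=0)) / (z_b.std(dim=0) + 1e-6)

    n, _ = z_a.shape
    c = (z_a.T @ z_b) / n  # (dim, dim) cross-correlation matrix

    on_diag = (torch.diagonal(c) - 1.0).pow(2).sum()              # matching dims -> correlation 1
    off_diag = (c - torch.diag(torch.diagonal(c))).pow(2).sum()   # decorrelate the rest
    return on_diag + lambda_offdiag * off_diag
```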

    DiffDock: Diffusion Steps, Twists, and Turns for Molecular Docking

    Predicting the binding structure of a small molecule ligand to a protein -- a task known as molecular docking -- is critical to drug design. Recent deep learning methods that treat docking as a regression problem have decreased runtime compared to traditional search-based methods but have yet to offer substantial improvements in accuracy. We instead frame molecular docking as a generative modeling problem and develop DiffDock, a diffusion generative model over the non-Euclidean manifold of ligand poses. To do so, we map this manifold to the product space of the degrees of freedom (translational, rotational, and torsional) involved in docking and develop an efficient diffusion process on this space. Empirically, DiffDock obtains a 38% top-1 success rate (RMSD < 2 Å) on PDBBind, significantly outperforming the previous state of the art of traditional docking (23%) and deep learning (20%) methods. Moreover, DiffDock has fast inference times and provides confidence estimates with high selective accuracy.
    Comment: Under review.
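    To make the pose parameterization concrete, here is a small sketch of the translational and rotational degrees of freedom the abstract describes, applied as a single forward-noising perturbation to ligand coordinates. The function name and noise model are assumptions for illustration (DiffDock uses dedicated diffusion processes on R^3, SO(3), and the torsion torus; torsional updates about rotatable bonds are omitted here).

```python
# Illustrative sketch of perturbing a ligand pose via its translational and
# rotational degrees of freedom; noise scales and helper names are assumptions,
# not DiffDock's actual schedule or code.
import numpy as np
from scipy.spatial.transform import Rotation


def perturb_pose(coords: np.ndarray, sigma_tr: float, sigma_rot: float) -> np.ndarray:
    """Apply one noising step to ligand coordinates of shape (N, 3)."""
    center = coords.mean(axis=0)
    # Random rotation about the ligand's center (a stand-in for SO(3) diffusion).
    rot = Rotation.from_rotvec(np.random.normal(scale=sigma_rot, size=3))
    rotated = (coords - center) @ rot.as_matrix().T + center
    # Random rigid translation in R^3.
    return rotated + np.random.normal(scale=sigma_tr, size=3)
```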

    DiffDock-PP: Rigid Protein-Protein Docking with Diffusion Models

    Understanding how proteins structurally interact is crucial to modern biology, with applications in drug discovery and protein design. Recent machine learning methods have formulated protein-small molecule docking as a generative problem, with significant performance boosts over both traditional and deep learning baselines. In this work, we propose a similar approach for rigid protein-protein docking: DiffDock-PP is a diffusion generative model that learns to translate and rotate unbound protein structures into their bound conformations. We achieve state-of-the-art performance on DIPS with a median C-RMSD of 4.85, outperforming all considered baselines. Additionally, DiffDock-PP is faster than all search-based methods and generates reliable confidence estimates for its predictions. Our code is publicly available at https://github.com/ketatam/DiffDock-PP
    Comment: ICLR Machine Learning for Drug Discovery (MLDD) Workshop 2023.
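    The rigid-body action the model learns, and the RMSD scoring mentioned above, reduce to a few lines of geometry. The sketch below shows one way to apply a predicted rotation and translation to an unbound protein and score the result against the bound pose; the function names are illustrative, and the paper's exact C-RMSD protocol (e.g., which atoms are included and how the complexes are superposed) may differ.

```python
# Minimal sketch of the rigid-body action predicted by a rigid docking model
# and a plain RMSD score; not the paper's exact C-RMSD evaluation protocol.
import numpy as np
from scipy.spatial.transform import Rotation


def apply_rigid(coords: np.ndarray, rot: Rotation, trans: np.ndarray) -> np.ndarray:
    """Rotate (N, 3) coordinates about their centroid, then translate."""
    center = coords.mean(axis=0)
    return (coords - center) @ rot.as_matrix().T + center + trans


def rmsd(pred: np.ndarray, target: np.ndarray) -> float:
    """Root-mean-square deviation between matched coordinate sets of shape (N, 3)."""
    return float(np.sqrt(((pred - target) ** 2).sum(axis=1).mean()))
```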

    Artificial Intelligence for Science in Quantum, Atomistic, and Continuum Systems

    Advances in artificial intelligence (AI) are fueling a new paradigm of discoveries in the natural sciences. Today, AI has started to advance the natural sciences by improving, accelerating, and enabling our understanding of natural phenomena across a wide range of spatial and temporal scales, giving rise to a new area of research known as AI for science (AI4Science). Being an emerging research paradigm, AI4Science is unique in that it is an enormous and highly interdisciplinary area. Thus, a unified and technical treatment of this field is needed yet challenging. This work aims to provide a technically thorough account of a subarea of AI4Science, namely AI for quantum, atomistic, and continuum systems. These areas aim at understanding the physical world from the subatomic (wavefunctions and electron density) and atomic (molecules, proteins, materials, and interactions) to the macroscopic (fluids, climate, and subsurface) scales, and form an important subarea of AI4Science. A unique advantage of focusing on these areas is that they largely share a common set of challenges, thereby allowing a unified and foundational treatment. A key common challenge is how to capture physics first principles, especially symmetries, in natural systems by deep learning methods. We provide an in-depth yet intuitive account of techniques to achieve equivariance to symmetry transformations. We also discuss other common technical challenges, including explainability, out-of-distribution generalization, knowledge transfer with foundation and large language models, and uncertainty quantification. To facilitate learning and education, we provide categorized lists of resources that we found to be useful. We strive to be thorough and unified and hope this initial effort may trigger more community interest and efforts to further advance AI4Science.
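    To illustrate the symmetry requirement the survey emphasizes, the toy example below checks that a model consuming only pairwise distances produces outputs that are invariant to rigid rotations of the input. The energy function here is a hypothetical stand-in chosen for brevity, not a method from the survey.

```python
# Minimal sketch of rotation invariance: a function of pairwise distances alone
# cannot change when the whole structure is rigidly rotated.
import numpy as np
from scipy.spatial.transform import Rotation


def toy_energy(coords: np.ndarray) -> float:
    """Sum of inverse pairwise distances -- depends only on the geometry."""
    diffs = coords[:, None, :] - coords[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    return float((1.0 / dists[np.triu_indices(len(coords), k=1)]).sum())


coords = np.random.rand(8, 3)                       # a toy "molecule" of 8 atoms
rotated = coords @ Rotation.random().as_matrix().T  # random rigid rotation
assert np.isclose(toy_energy(coords), toy_energy(rotated))  # invariance holds
```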